Hellinger distance


Neighbor Embedding for High-Dimensional Sparse Poisson Data

Mudrik, Noga, Charles, Adam S.

arXiv.org Machine Learning

Across many scientific fields, measurements often represent the number of times an event occurs. For example, a document can be represented by word occurrence counts, neural activity by spike counts per time window, or online communication by daily email counts. These measurements yield high-dimensional count data that often approximate a Poisson distribution, frequently with low rates that produce substantial sparsity and complicate downstream analysis. A useful approach is to embed the data into a low-dimensional space that preserves meaningful structure, commonly termed dimensionality reduction. Yet existing dimensionality reduction methods, including both linear (e.g., PCA) and nonlinear approaches (e.g., t-SNE), often assume continuous Euclidean geometry, thereby misaligning with the discrete, sparse nature of low-rate count data. Here, we propose p-SNE (Poisson Stochastic Neighbor Embedding), a nonlinear neighbor embedding method designed around the Poisson structure of count data, using KL divergence between Poisson distributions to measure pairwise dissimilarity and Hellinger distance to optimize the embedding. We test p-SNE on synthetic Poisson data and demonstrate its ability to recover meaningful structure in real-world count datasets, including weekday patterns in email communication, research area clusters in OpenReview papers, and temporal drift and stimulus gradients in neural spike recordings.
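To make the two dissimilarities named in the abstract concrete, here is a minimal sketch (not the authors' code) of the Poisson KL divergence and Hellinger distance between count vectors. It treats each count vector as the rate vector of an independent multivariate Poisson; the smoothing constant EPS, which keeps zero counts well-defined, is our assumption rather than anything specified in the paper.

```python
import numpy as np

EPS = 1e-2  # hypothetical smoothing so zero counts give finite, positive rates

def poisson_kl(x, y, eps=EPS):
    """KL( Pois(x) || Pois(y) ) summed over independent dimensions:
    sum_i [ x_i * log(x_i / y_i) + y_i - x_i ]. Note: asymmetric in (x, y)."""
    lx, ly = x + eps, y + eps
    return float(np.sum(lx * np.log(lx / ly) + ly - lx))

def poisson_hellinger(x, y, eps=EPS):
    """Hellinger distance between independent multivariate Poissons:
    H^2 = 1 - exp( -0.5 * sum_i (sqrt(x_i) - sqrt(y_i))^2 )."""
    lx, ly = x + eps, y + eps
    h2 = 1.0 - np.exp(-0.5 * np.sum((np.sqrt(lx) - np.sqrt(ly)) ** 2))
    return float(np.sqrt(h2))

def pairwise(X, metric):
    """Dense (n, n) dissimilarity matrix for an (n, d) count matrix X."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                D[i, j] = metric(X[i], X[j])
    return D
```

In a t-SNE-style pipeline one would, for instance, build high-dimensional neighbor probabilities from the KL-based matrix and optimize the embedding against the Hellinger-based one; how exactly p-SNE combines the two is specified in the paper, not in this sketch.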




A Additional definitions

Neural Information Processing Systems

We provide the definitions of important terms used throughout the paper. We show that there exist appropriate constants such that the prior distribution satisfies Assumption 2.3 when the demand distribution is exponential (using Lemma B.1), and, in a following result, when the demand distribution is a multivariate Gaussian with unknown parameters. The proof is a direct consequence of Theorem 3.2, Lemmas B.6, B.7, B.8, and B.9, and Proposition 3.2. The condition on the prior induced by Assumption 2.2 follows from a standard result [Theorem 6.19], while Assumptions 2.4 and 2.5 are straightforward to satisfy given the form of the model risk function (Lemma B.13). Combining these results with Proposition 3.2 implies that the RSVB posterior converges at the desired rate. In Section C.1 we present an alternative derivation of LCVB, and we prove our main result after a series of important lemmas.





Universal Boosting Variational Inference

Trevor Campbell, Xinglong Li

Neural Information Processing Systems

Boosting variational inference (BVI) approximates an intractable probability density by iteratively building up a mixture of simple component distributions one at a time, using techniques from sparse convex optimization to provide both computational scalability and approximation error guarantees. But the guarantees have strong conditions that do not often hold in practice, resulting in degenerate component optimization problems; and we show that the ad-hoc regularization used to prevent degeneracy in practice can cause BVI to fail in unintuitive ways.
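To illustrate the setup the abstract describes, below is a minimal, hypothetical sketch of greedy boosting-VI iterations with an entropy-regularized component objective, i.e., the kind of ad-hoc regularization the abstract criticizes. This is not the authors' method: the target density, the one-dimensional Gaussian component family, the sample size, and the regularization weight LAM are all our assumptions.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
Z = rng.standard_normal(512)  # fixed base samples: reparameterize x = mu + sigma * z
LAM = 1.0                     # hypothetical entropy-regularization weight

def log_target(x):
    # hypothetical target: a bimodal Gaussian mixture
    return np.log(0.5 * norm.pdf(x, -2.0, 0.6) + 0.5 * norm.pdf(x, 2.0, 0.6))

def mix_logpdf(x, comps, weights):
    dens = 1e-300  # guard against log(0) when the mixture is still empty
    for (m, s), w in zip(comps, weights):
        dens = dens + w * norm.pdf(x, m, s)
    return np.log(dens)

def fit_component(comps, weights, lam=LAM):
    """Greedy step: maximize E_s[log p - log q_t] + lam * H(s) over s = N(mu, sigma^2).
    Without the entropy term H(s) the optimum can degenerate (sigma -> 0)."""
    def neg_obj(params):
        mu, log_sig = params
        x = mu + np.exp(log_sig) * Z
        entropy = 0.5 * np.log(2.0 * np.pi * np.e) + log_sig
        return -(np.mean(log_target(x) - mix_logpdf(x, comps, weights)) + lam * entropy)
    res = minimize(neg_obj, x0=[rng.normal(scale=2.0), 0.0], method="Nelder-Mead")
    mu, log_sig = res.x
    return (mu, float(np.exp(log_sig)))

def line_search(comps, weights, new):
    """Pick gamma for q_{t+1} = (1 - gamma) q_t + gamma * s by maximizing the ELBO,
    using E_q[f] = sum_i w_i E_{s_i}[f] with the shared base samples Z."""
    def neg_elbo(gamma):
        c = comps + [new]
        w = [wi * (1.0 - gamma) for wi in weights] + [gamma]
        return -sum(wi * np.mean(log_target(m + s * Z) - mix_logpdf(m + s * Z, c, w))
                    for (m, s), wi in zip(c, w))
    return minimize_scalar(neg_elbo, bounds=(1e-3, 1.0 - 1e-3), method="bounded").x

comps, weights = [fit_component([], [])], [1.0]  # first component gets weight 1
for _ in range(3):                               # a few greedy boosting iterations
    s_new = fit_component(comps, weights)
    gamma = line_search(comps, weights, s_new)
    weights = [w * (1.0 - gamma) for w in weights] + [gamma]
    comps.append(s_new)
```

The lam-weighted entropy term here is exactly the kind of heuristic the abstract targets: the sketch runs, but the paper's argument is that such regularization can make boosting fail in unintuitive ways, motivating a differently constructed scheme.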